Segmentation of Credit Card customers to define marketing strategies.

Final Project

Group1

Joel Lautenschlager, Rabea Radman, Soyuz Shrestha, Sayali Dhore, Aron Firew


Section 1:

1. Business Problem

1.1 Problem Description:

A manager at a credit card issuer is having difficulty meeting recruitment targets for new clients and is struggling to retain current customers with new targeted services. If the issuer could target its advertising, less would need to be spent, and targeted campaigns could produce better results. The dataset is from Kaggle.

2. Business Question:

How can we group the clientele into semi-distinct groups for advertising campaigns?

3. Objective:

The data, obtained from Kaggle as a single *.csv file, contains information on 8,950 clients across 18 features. Six of the features are ratios; the remainder are continuous. In accordance with the issuer’s privacy policies and any applicable legislation, the data contains no identifiable demographic information and relates solely to activity in existing credit card accounts: financial activity and the client’s relationship with the credit card issuer. The information we derive from this report will be of particular use to:

● The customer retention department, whose success is primarily measured by the percentage of customers it can retain

● The advertising department, whose success is primarily measured by impressions and “click-through rate”

● Commissioned sales agents, whose success is primarily measured by how many new clients they sign up in a given period

The issuer currently has limited models in place for targeting its advertising, and relies solely on a generalized assessment of the current clientele to predict future clientele. Ultimate success of the model will be measured by revenue growing faster than advertising spend.

4. Importing all necessary libraries
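The import cell itself is not shown in this export; a typical set of libraries covering the steps described below (the exact set used in the notebook may differ slightly) would be:

```python
# Core data handling and plotting libraries used throughout the analysis.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Scikit-learn components for scaling, clustering, and visualization.
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
```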

5. Loading the data
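The Kaggle dataset is commonly distributed as a file named "CC GENERAL.csv"; that file name is an assumption here. A minimal loading sketch, using a two-row stand-in so it runs standalone:

```python
import io
import pandas as pd

# In the notebook this would simply be:
#     df = pd.read_csv("CC GENERAL.csv")   # file name is an assumption
# A two-row stand-in with a few of the 18 columns keeps the sketch self-contained.
sample = io.StringIO(
    "CUST_ID,BALANCE,PURCHASES,CREDIT_LIMIT,MINIMUM_PAYMENTS,TENURE\n"
    "C10001,40.9,95.4,1000.0,201.8,12\n"
    "C10002,3202.5,0.0,7000.0,4103.0,12\n"
)
df = pd.read_csv(sample)
print(df.shape)   # (2, 6) for the stand-in; (8950, 18) for the full file
```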

6. Exploratory Data Analysis (EDA)

The objective of this step is to dig into the data to discover insights. The data covers exclusively the previous 12 months of use; no data is available on user behaviour for any earlier period. As a result, the vast majority of customers have a tenure of 12 months (the maximum). Although this feature is nearly uniform, it is left in the model in case the model is reused in the future with a different tenure range. We chose Python for the analysis, as the relevant libraries were easy to work with.

6.1 Checking for missing values

The data is fairly clean. We have a very limited number of missing values, almost entirely confined to the MINIMUM_PAYMENTS column. For our purposes, the “float64” and “int64” data types are interchangeable and no cause for concern. The one “object” column is CUST_ID, which is addressed below.

The missing values are confined to two columns: MINIMUM_PAYMENTS and CREDIT_LIMIT.
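The missing-value check can be sketched as follows, using a small stand-in frame with the two affected columns:

```python
import numpy as np
import pandas as pd

# Small stand-in frame; in the notebook, df is the full Kaggle dataset.
df = pd.DataFrame({
    "CREDIT_LIMIT": [1000.0, np.nan, 7000.0, 5000.0],
    "MINIMUM_PAYMENTS": [201.8, 4103.0, np.nan, np.nan],
})

# Count missing values per column, as done in this EDA step.
missing = df.isnull().sum()
print(missing)
```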

8. Preparation

8.1 Part 1:

We will use all features of the data except CUST_ID. CUST_ID is an arbitrary number assigned to each customer by the bank, and a higher or lower client number has no bearing on the characteristics of the customer. The remainder of the data is all numeric. Since (almost) all missing values are concentrated in the MINIMUM_PAYMENTS column and comprise less than 3.5% of its values, from this point forward they are imputed with the column median.
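The two preparation steps just described — dropping the identifier and median imputation — can be sketched as:

```python
import numpy as np
import pandas as pd

# Stand-in frame; in the notebook, df is the full Kaggle dataset.
df = pd.DataFrame({
    "CUST_ID": ["C10001", "C10002", "C10003"],
    "MINIMUM_PAYMENTS": [100.0, np.nan, 300.0],
    "CREDIT_LIMIT": [1000.0, 2000.0, np.nan],
})

# Drop the arbitrary identifier, then fill remaining gaps with each
# column's median value.
df = df.drop(columns=["CUST_ID"])
df = df.fillna(df.median())
print(df)
```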

8.2 Part 2:

Correlation between variables on each axis

Correlation Analysis - Spearman's Rank Correlation

The Spearman rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and 1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.

Spearman correlation evaluates the statistical relationship between two variables. We cannot confirm at the outset which variables should be given the most consideration, so this step saves time otherwise spent analyzing variables that do not strongly affect the rate of attrited customers. The variables that show a sizeable negative relationship relative to the dependent attribute, and that are therefore the target of investigation, are Total_Trans_Ct, Total_Ct_Chng_Q4_Q1, Total_Revolving_Bal and Avg_Utilization_Ratio. A negative association (&lt;0) indicates that the attribute has a relevant level of importance for the customer’s permanence.
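A small illustration of why Spearman is preferred here: for a perfectly monotonic but nonlinear relation, ρ equals 1 even though Pearson's r does not.

```python
import pandas as pd

# y = x**2 is a monotonic but nonlinear function of x.
df = pd.DataFrame({"x": [1, 2, 3, 4, 5],
                   "y": [1, 4, 9, 16, 25]})

# Rank-based Spearman correlation matrix, as used in this section.
rho = df.corr(method="spearman")
print(rho.loc["x", "y"])   # 1.0 — perfect monotonic association
```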

8.3 Part 3:

Visualizing data and checking for outliers

Some normalizing of the data is done via the Standard Scaler to preserve the validity of the outlier data. A RobustScaler was also explored, so as not to bias the model towards any one feature while still recognizing the outliers’ characteristics as valid. Compared to a MinMax scaler, the Robust Scaler is based on percentiles and is therefore not influenced by a few very large marginal outliers (Compare the effect of different scalers on data with outliers, n.d.). The entire model was tried with the Robust Scaler, but we found that it invalidated some of the data and showed a propensity to bias the model towards the median; the resulting clusters were poorly representative of the actual customers.

8.4 Part 4:

Dealing with Outliers

Dropping outliers would cost a great deal of data, as the dataset contains many of them. When we tried removing the outliers, we observed losing up to 61% of the dataset. Instead of removing outliers, we therefore cap extreme values into ranges.
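One way to realize this capping — clipping each numeric column at its 1st and 99th percentiles — is sketched below; the exact ranges used in the notebook may differ, but the idea of bounding rather than dropping is the same.

```python
import pandas as pd

# Stand-in column with one extreme value; in the notebook this is applied
# to each numeric feature of the full dataset.
df = pd.DataFrame({"PURCHASES": [0.0, 50.0, 120.0, 300.0, 50000.0]})

# Clip at the 1st and 99th percentiles: extreme values are pulled to the
# boundary instead of the whole row being dropped.
lo, hi = df["PURCHASES"].quantile([0.01, 0.99])
df["PURCHASES"] = df["PURCHASES"].clip(lower=lo, upper=hi)
print(df["PURCHASES"].max())   # bounded below the original extreme of 50000
```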

8.5 Part 5:

Normalizing input values using Standard Scaler

We will normalize the input values using the Standard Scaler. As noted above, the Robust Scaler was also tried, but it invalidated some of the data and biased the model towards the median. The Standard Scaler is also the recommended scaler for K-Means algorithms (discussed below): since K-Means uses a distance-based measure of similarity between data points, and the features have different units of measurement, it is recommended to scale them to a mean of zero and a standard deviation of one. With this in mind, the Standard Scaler is the method we chose.
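The scaling step can be sketched as follows; after fitting, each column has mean ≈ 0 and standard deviation 1:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-in matrix with two features on very different scales
# (e.g. credit limit in dollars vs. tenure in months).
X = np.array([[1000.0, 12.0],
              [2000.0,  6.0],
              [3000.0,  9.0]])

# Standardize to zero mean and unit variance so no feature dominates
# the distance computations in K-Means.
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0))   # ~0 per column
print(X_scaled.std(axis=0))    # 1 per column
```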

8.6 Part 6:

Clustering Using KMEANS

The objective of K-Means clustering is easy to understand: it groups similar data points together to discover underlying patterns. To achieve this, we need to find the optimal number of clusters, K (each cluster is represented by a centroid). The algorithm allocates every data point to the nearest centroid while minimizing the within-cluster sum of squares. The number of clusters is limited only by the size of the dataset (e.g. it makes no sense to have more clusters than data points).

In the code below, 'b' stands for blue, 'x' for the marker drawn at each point in the elbow chart, and '-' for the line connecting the markers.
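The elbow computation referenced here is not shown in this export; a sketch consistent with the description (using a synthetic stand-in for the scaled customer matrix) would be:

```python
import matplotlib
matplotlib.use("Agg")   # headless backend so the sketch runs in scripts
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the scaled customer matrix.
X, _ = make_blobs(n_samples=300, centers=6, random_state=42)

# Fit K-Means for k = 1..10 and record the within-cluster sum of squares.
inertias = []
ks = range(1, 11)
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias.append(km.inertia_)

# 'bx-': blue 'x' markers connected by a solid line.
plt.plot(list(ks), inertias, "bx-")
plt.xlabel("k (number of clusters)")
plt.ylabel("Within-cluster sum of squares (inertia)")
plt.title("Elbow method")
plt.savefig("elbow.png")
```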

The ideal number of clusters is found at the “elbow”. As the graph shows, our ideal number of clusters is likely around 6.

Given its iterative nature, K-Means can be inconsistent: the end result often depends on where the centroids are initialized, and the algorithm may get stuck in a local rather than a global optimum. K-Means also assumes spherical clusters and may produce some overlap if the clusters are non-spherical. There is no intrinsic measure of uncertainty in any overlapping region, so overlap between clusters is likely. This is acceptable in our case, as customers could plausibly belong to either cluster in any given month; marketing strategies should therefore be careful not to pigeonhole customers too strongly.
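The initialization sensitivity described above is commonly mitigated by restarting K-Means several times (`n_init`) and fixing `random_state` for reproducibility; a sketch on synthetic data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the scaled customer matrix.
X, _ = make_blobs(n_samples=300, centers=6, random_state=0)

# n_init restarts K-Means from several initializations and keeps the best
# fit (lowest inertia); a fixed random_state makes the labels reproducible
# between runs.
km_a = KMeans(n_clusters=6, n_init=10, random_state=42).fit(X)
km_b = KMeans(n_clusters=6, n_init=10, random_state=42).fit(X)
print(np.array_equal(km_a.labels_, km_b.labels_))   # True — same seed, same labels
```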

8.7 Part 7:

Clusters Interpretations:

Cluster 0: People with average-to-high balances who make many different kinds of purchases, with average-to-high cash advances and average-to-high credit limits

Cluster 1: People with low balances, low purchases and low credit limits who take cash advances more frequently

Cluster 2: People with medium-to-high credit limits who spend less overall but spend more on installments

Cluster 3: People with high credit limits who take more cash advances

Cluster 4: People with high credit limits, the highest purchases, the highest installment purchases and more cash advances

Cluster 5: People with average credit limits who take cash advances most frequently and do not spend much

Please note: cluster numbering may change when the notebook is re-run.

8.8 Part 8:

Clusters Visualizations

Visualization using PCA to transform data to 2 dimensions
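The 2-D projection for this visualization can be sketched as follows, again with a synthetic stand-in for the scaled customer matrix:

```python
import matplotlib
matplotlib.use("Agg")   # headless backend so the sketch runs in scripts
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Stand-in for the scaled feature matrix and its K-Means labels.
X, _ = make_blobs(n_samples=300, centers=6, n_features=8, random_state=42)
labels = KMeans(n_clusters=6, n_init=10, random_state=42).fit_predict(X)

# Project the high-dimensional features onto the first two principal
# components so the clusters can be plotted in 2-D.
X_2d = PCA(n_components=2).fit_transform(X)
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=labels, cmap="tab10", s=10)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("Clusters projected onto the first two principal components")
plt.savefig("clusters_pca.png")
print(X_2d.shape)   # (300, 2)
```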

Hierarchical Clustering

Based on the silhouette scores, k = 6 is the number of clusters that gives us the best model fit out of the 4 hierarchical models above.
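The silhouette comparison for hierarchical (agglomerative) clustering can be sketched as follows; the candidate with the highest silhouette score is taken as the best fit.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the scaled customer matrix.
X, _ = make_blobs(n_samples=300, centers=6, random_state=42)

# Fit agglomerative clustering for several candidate cluster counts and
# score each partition; higher silhouette means more coherent clusters.
scores = {}
for k in range(2, 9):
    labels = AgglomerativeClustering(n_clusters=k).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```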